\title{Laplace Approximation}

\subsection{Laplace Approximation}

(This tutorial follows the
\href{/tutorials/map}{Maximum a posteriori estimation} tutorial.)

Maximum a posteriori (MAP) estimation approximates the posterior $p(\mathbf{z} \mid \mathbf{x})$
with a point mass (delta function) by simply capturing its mode. MAP is
attractive because it is fast and efficient. How can we use MAP to construct a
better approximation to the posterior?

The Laplace approximation
\citep{laplace1986memoir}
is one way of improving a MAP estimate. The idea
is to approximate the posterior with a normal distribution centered at the MAP
estimate,
\begin{align*}
  p(\mathbf{z} \mid \mathbf{x})
  &\approx
  \text{Normal}(\mathbf{z}\;;\; \mathbf{z}_\text{MAP}, \Lambda^{-1}).
\end{align*}
This requires computing a precision matrix $\Lambda$. The Laplace
approximation is derived from a second-order Taylor expansion of the
log joint density around the MAP estimate. It sets $\Lambda$ to the
Hessian of the negative log joint density, evaluated at the MAP estimate.
Component-wise,
\begin{align*}
  \Lambda_{ij}
  &=
  -\frac{\partial^2 \log p(\mathbf{x}, \mathbf{z})}{\partial z_i \partial z_j}
  \bigg|_{\mathbf{z} = \mathbf{z}_\text{MAP}}.
\end{align*}
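To see where the Hessian comes from, here is a short sketch of the
standard argument. Expand the log joint density to second order around
$\mathbf{z}_\text{MAP}$. Assuming the mode is an interior point where the
log joint is twice differentiable, the gradient vanishes there and the
first-order term drops out:
\begin{align*}
  \log p(\mathbf{x}, \mathbf{z})
  &\approx
  \log p(\mathbf{x}, \mathbf{z}_\text{MAP})
  - \frac{1}{2}
  (\mathbf{z} - \mathbf{z}_\text{MAP})^\top
  \Lambda\,
  (\mathbf{z} - \mathbf{z}_\text{MAP}).
\end{align*}
Because $p(\mathbf{z} \mid \mathbf{x}) \propto p(\mathbf{x}, \mathbf{z})$,
exponentiating the right-hand side and normalizing over $\mathbf{z}$ gives
a normal density with mean $\mathbf{z}_\text{MAP}$ and covariance
$\Lambda^{-1}$, provided $\Lambda$ is positive definite.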
For flat priors (which reduce MAP to maximum likelihood), the
precision matrix is known as the observed Fisher information
\citep{fisher1925theory}.
Edward uses TensorFlow's automatic differentiation, making this
second-order derivative computation both simple and efficient to
distribute.
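
As a minimal sketch of the recipe above (not Edward's implementation),
the following pure-Python example computes a Laplace approximation for a
toy model. The Gamma-Poisson model, the data, and the helper names
\texttt{neg\_log\_joint} and \texttt{hessian} are assumptions made for
illustration. Central finite differences stand in for automatic
differentiation, and the model's closed-form mode is used as the MAP
estimate instead of running an optimizer.

```python
import math


def neg_log_joint(z, x, a=2.0, b=1.0):
    """-log p(x, z) up to an additive constant for a toy model (assumed):
    z ~ Gamma(shape=a, rate=b) and x_n ~ Poisson(z)."""
    rate = z[0]
    log_prior = (a - 1.0) * math.log(rate) - b * rate
    log_lik = sum(x_n * math.log(rate) - rate for x_n in x)
    return -(log_prior + log_lik)


def hessian(f, z, eps=1e-4):
    """Central finite-difference Hessian of f at z (a list of floats).
    A stand-in for automatic differentiation in this sketch."""
    d = len(z)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def shifted(si, sj):
                # Evaluate f with coordinates i and j perturbed by +/- eps.
                w = list(z)
                w[i] += si * eps
                w[j] += sj * eps
                return f(w)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps ** 2)
    return H


# Hypothetical count data and prior hyperparameters.
x = [3, 5, 4, 6, 2]
a, b = 2.0, 1.0

# The mode of this model's posterior is available in closed form.
z_map = [(a - 1.0 + sum(x)) / (b + len(x))]

# Precision matrix: Hessian of the negative log joint at the MAP estimate.
Lambda = hessian(lambda z: neg_log_joint(z, x, a, b), z_map)
variance = 1.0 / Lambda[0][0]

print("MAP estimate:", z_map[0])
print("precision:", Lambda[0][0])
print("variance:", variance)
```

For this model the precision can be checked by hand:
$\Lambda = (b + N)^2 / (a - 1 + \sum_n x_n)$, where $N$ is the number of
observations. The numerical value should agree with it up to
finite-difference error.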

For more details, see the \href{/api/}{API} as well as its
implementation in Edward's code base.

\subsubsection{References}\label{references}
